Clustering Gene Expression Time Series Data
Abstract
Efficiently and effectively finding genes with similar behavior in microarray data is an important task in the bioinformatics community. Co-expressed genes show the same behavior or are controlled by the same regulatory mechanisms. Clustering analysis is a very popular technique for grouping co-expressed genes into the same cluster. A key issue in clustering gene expression time series data is how to define the similarity between two time series. Distance measures and correlation coefficients are commonly used similarity definitions. However, two time series might be far apart under such measures, yet become similar once a few items are dropped from one of them. In this paper, we consider this new aspect of time series similarity, called the “shift effect,” which captures a temporal gap between two time series. Partition-based clustering methods also require users to specify the target number of clusters, which is usually done by trial and error over a large range of candidate values. To solve this problem, we treat time series as sequences and apply a sequential pattern mining technique. The number of frequent patterns is taken as the number of target clusters, and all the time series supporting a sequential pattern become the initial members of a cluster. Each time series is then iteratively reassigned to a suitable cluster.
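The pipeline described above can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say how series are turned into sequences, which items may be dropped, how long patterns are, or how a "suitable cluster" is chosen. Here the hypothetical helper `to_symbols` encodes each step as up/down/flat, `shift_distance` reads the shift effect as a lag of up to `max_shift` steps, `frequent_patterns` counts contiguous patterns only (a restricted form of sequential pattern mining), and `pattern_seeded_clustering` reassigns series to the nearest medoid. All names and parameters (`eps`, `length`, `min_support`, `max_shift`, `max_iter`) are illustrative assumptions.

```python
from collections import Counter


def to_symbols(series, eps=0.1):
    """Encode each step of a series as 'U' (up), 'D' (down) or 'F' (flat).

    Assumption: the abstract does not say how series become sequences.
    """
    out = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        out.append("U" if delta > eps else "D" if delta < -eps else "F")
    return "".join(out)


def shift_distance(a, b, max_shift=1):
    """Smallest mean squared difference after shifting one series against
    the other by up to max_shift steps (s leading points dropped from one
    series, s trailing points from the other, compared on the overlap).

    Assumption: one possible reading of the "shift effect".
    """
    n = min(len(a), len(b))
    a, b = list(a[:n]), list(b[:n])
    best = float("inf")
    for s in range(min(max_shift, n - 1) + 1):
        for x, y in ((a[s:], b[:n - s]), (a[:n - s], b[s:])):
            best = min(best, sum((p - q) ** 2 for p, q in zip(x, y)) / len(x))
    return best


def frequent_patterns(symbol_strings, length=3, min_support=2):
    """Contiguous patterns of the given length that occur in at least
    min_support series (each series counts once per pattern)."""
    counts = Counter()
    for s in symbol_strings:
        counts.update({s[i:i + length] for i in range(len(s) - length + 1)})
    return sorted(p for p, c in counts.items() if c >= min_support)


def medoid(indices, data, max_shift):
    """Member with the smallest total shift distance to the other members."""
    return min(indices, key=lambda i: sum(
        shift_distance(data[i], data[j], max_shift) for j in indices))


def pattern_seeded_clustering(data, length=3, min_support=2,
                              max_shift=1, eps=0.1, max_iter=20):
    symbols = [to_symbols(x, eps) for x in data]
    patterns = frequent_patterns(symbols, length, min_support)
    # Series supporting a pattern are the initial members of its cluster;
    # seeds that share a medoid are merged here (a simplification).
    seeds = [[i for i, s in enumerate(symbols) if p in s] for p in patterns]
    medoids = sorted({medoid(m, data, max_shift) for m in seeds})
    if not medoids:
        return [-1] * len(data), []  # no frequent pattern: nothing to seed
    labels = []
    for _ in range(max_iter):
        # Reassign every series to the cluster with the nearest medoid.
        labels = [min(range(len(medoids)),
                      key=lambda k: shift_distance(x, data[medoids[k]], max_shift))
                  for x in data]
        members = [[i for i, lab in enumerate(labels) if lab == k]
                   for k in range(len(medoids))]
        # An empty cluster keeps its previous medoid.
        new = [medoid(m, data, max_shift) if m else medoids[k]
               for k, m in enumerate(members)]
        if new == medoids:
            break
        medoids = new
    return labels, medoids


if __name__ == "__main__":
    data = [[0, 1, 2, 3, 4, 5], [0, 0, 1, 2, 3, 4],
            [5, 4, 3, 2, 1, 0], [5, 5, 4, 3, 2, 1]]
    print(pattern_seeded_clustering(data))  # ([0, 0, 1, 1], [0, 2])
```

In the toy data, the second series of each pair lags the first by one step, so the plain mean squared difference between them is nonzero while `shift_distance` with `max_shift=1` is zero; the two rising profiles and the two falling profiles end up together. Medoids are used instead of averaged centroids because an average profile is poorly defined when members match each other only at different shifts; this is a design choice of the sketch, not something the abstract specifies.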
Similar resources
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement on the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information-gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Multiple gene expression profile alignment for microarray time-series data clustering
MOTIVATION: Clustering gene expression data given in terms of time-series is a challenging problem that imposes its own particular constraints. Traditional clustering methods based on conventional similarity measures are not always suitable for clustering time-series data. A few methods have been proposed recently for clustering microarray time-series, which take the temporal dimension of the da...
Fuzzy clustering of time series data: A particle swarm optimization approach
With rapid development in information-gathering technologies and access to large amounts of data, we need methods for analyzing data and extracting useful information from large raw datasets, and data mining is an important approach to this problem. Clustering analysis, as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...
Constrained Subspace Clustering for Time Series Gene Expression Data
For time series gene expression data, finding subgroups of genes with similar expression patterns in a consecutive time window is an important problem. In this paper, we extend a fuzzy c-means clustering algorithm to construct two models that detect two kinds of biclusters, respectively: constant-value biclusters and similarity-based biclusters whose gene expression profiles are similar within consecut...
Clustering Analysis of Gene Expression Time Series Data
Microarrays are used to generate large amounts of gene expression data and to observe the differences among gene expression levels. Gene expression time series data represent the trends of gene behavior. Clustering is a popular analysis for gene expression time series data; genes in the same cluster have similar behavior. Cluster analysis helps researchers investigate the relationships among genes. We pro...
Publication date: 2004